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Abstract 

A graph based matching is used to construct aggregation based coarsening for algebraic multigrid method. 
Effects of inexact coarse grid solve is analyzed numerically for a highly discontinuous convection-diffusion 
coefficient matrix, and for problems from Florida matrix market collection. The proposed strategy is found 
to be more robust compared to a classical AMG approach. 



1 Introduction 

We concern ourselves with the problem of solving large sparse linear system of the form 

Ax = b, (1) 
arising from the cell centered finite volume discretization of the convection diffusion equation as follows 

div(a(a;)w) — div(K(a;)Vw) = / in fi, 

u = on dfl D , (2) 
du 

— — on oQn, 
on 

where 51 = [0,1]" (n — 2, or 3), 8Hn = dil \ dilo- The vector field a and the tensor k are the given 
coefficients of the partial differential operator. In 2D case, we have dSlo — [0, 1] x {0, 1}, and in 3D case, 
we have dSlo = [0, 1] x {0, 1} x [0, 1]. Other sources include problems from Florida matrix market collection 
[9]; see Table for a list of problems considered in this paper. 



Currently, one of the most successful methods for these problems are the multigrid methods (MG) [21, 251126). 
The robustness of the multigrid method is significantly improved when they are used as a preconditioner 
in the Krylov subspace method [22j . If B denotes the MG preconditioner then the preconditioned linear 
system is the transformation of the linear system (JXJ) to B~ x Ax — B~ x b. Here B is a preconditioner which 
an approximation to the matrix A such that the spectrum of B~ x A is "favourable" for the faster convergence 
of Krylov methods. For instance, if the eigenvalues are clustered and are sufficiently close to one, then a 
fast convergence is observed in practice. Furthermore, the preconditioner B should be cheap to build and 
apply. With the advent of modern day multiprocessor and multicore era, the proposed method should have 
sufficient parallelism as well. 

In multigrid like methods, the problem is solved using a hierarchy of discretizations; the finest grid is at 
the top of the hierarchy followed by coarser grids. The two complementary processes are: smoothcning 
and coarse grid correction. The smoothers are usually chosen to be the classical relaxation methods such 
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as Jacobi, Gauss-Seidel, or incomplete LU methods [22 . Analysis for model problems reveals that the 
smoothers efficiently eliminates the low frequency part of the error, while the global correction which is 
obtained by solving a restricted problem on the coarser grid damps the high frequency part of the error 
[30) . In fact, low frequency errors on the fine grid becomes high frequency error on the coarse grid leading 
to their efficient resolution on the coarser grid. It is therefore crucial to choose efficient smoothers and a 
coarse grid solver. The classical geometric multigrid methods require informations on the grid geometry and 
constructs a restriction operator and a coarse grid. Since, a geometric multigrid method is closely related to 
the grid, the problem with nonlinearity can be resolved efficiently. But, for a complex grid, the applicability 
of the method becomes increasingly difficult. On contrary, algebraic multigrid method defines the necessary 
ingredients based solely on the coefficient matrix. Much research have been devoted to algebraic multigrid 
methods and several variants exists. 

In this paper, an aggregation based algebraic multigrid is proposed. The aggregation is based on graph 
matching. This is achieved by partitioning the graph of the matrix such that the partitioned subgraphs 
are assumed to be the aggregates. Once a set of aggregates is defined, the coarse grid is constructed from 
the Galerkin formula. In authors use graph partitioner to form aggregates, and forward Gauss-Seidel 
with downwind numbering is used as pre- and post-smoother with the usual recursive multigrid method, 
where the coarsening is continued untill the number of unknowns in the coarse grid are less than 10. This 
approach may lead to a deep hierarchy of grids, thus making the method very recursive and less adapted 
to modern day multi-processor or multi-core environment. In [2Q], similar graph based matching is used to 
form a coarse grid, and the classical recursive smoothed AMG approach is followed, however, here, ILUT 
[22) is used for pre- and post-smoothing. 

Our aim in this work is to propose a strategy that tries to avoid deep recursion but combines several different 
approaches as above. The strategy we adopt has the following ingredients: 

• Coarsening based on graph matching 

• ILU(O) is the smoother with natural or nested dissection reordering 

• Coarse grid equation is solved inexactly using ILUT 

We show that the strategy proposed above is simple, easy to implement, and works well in practice for 
symmetric positive definite systems with large jumps in the coefficients. Solving a coarse grid inexactly leads 
to a faster and cheaper method. Indeed, a parallel incomplete coarse grid solve will be desirable, however, 
in this work, we consider only the sequential version. We provide an estimate of heurestic coarse grid size 
and an estimate of a parameter involved in the inexact coarse grid solve. We compare our approach with a 
classical AMG [19] with Gauss-Seidel smoothing and exact coarse grid solve. 

The rest of the paper is organized as follows. In section (2), we discuss the classical coarsening strategy 
based on strength of connection, and the one based on graph matching. The numerical experiments are 
presented in section (3); the proposed method is compared with a classical AMG method on discontinous 
convection-diffusion problems and some problems from Florida matrix market collection [9] . Finally, section 
(4) concludes the paper. 



2 Graph matching based aggregation for AMG 

In a typical two grid method, there are two complementary processes namely, a smoother and a coarse grid 
correction step. When this strategy is repeated by creating another coarser grid, then the method is known 
as multigrid. During a fixed point iterative process, the high frequency components of the error or the so 
called rough part are dealt with efficiently by a smoother. On the other hand, the low frequency components 
of the error can only with dealt with globally by a method that is "connected" globally (pertains to global 
fine grid). We imagine the coefficients of the matrix as an approximation to some function (Jacobian in 
case of nonlinear iteration). For an example, assuming that a linear function is sufficiently smooth, an 
approximation to the function with N discrete points is close to an approximation to the function with only 
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N/2 (or even N/4) discrete points. Thus, we solve the problem cheaply with N/2 grid points (i.e, on a 
coarser grid) and then we interpolate the solution to obtain an approximation to the problem defined on 
N grid points. The error in the solution thus obtained has rough components because they were not taken 
into account properly while solving with the coarse solver, and this is where smoother comes into play. This 
interplay of smoother and coarse grid correction are complementary. For a more rigorous explanation, an 
inclined reader is referred to [21[ 1251 [26j where tools from Fourier analysis is used to explain why smoothening 
and coarse grid correction step works effectively for some problems. 

In classical AMG, a set of coarse grid unknowns is selected and the matrix entries are used to build inter- 
polation rules that define the prolongation matrix P, and the coarse grid matrix A c is computed from the 
following Galerkin formula 

A c = P T AP. (3) 

In contrast to the classical AMG approach, in aggregation based multigrid, first a set of aggregates Gi are 
defined. Let N c be the number of such aggregates, then the interpolation matrix P is defined as follows 

p [l. ifieG h 
[0, otherwise, 

Here, 1 < i < N, 1 < j < N c , N being the size of the original coefficient matrix A. Further, we assume that 
the aggregates Gi are such that 

G t n Gj = <j>, for i ^ j and U 4 G; = [1, N] (4) 

Here [1, N] denotes the set of integers from 1 to N. Notice that the matrix P defined above is an N x N c 
matrix, but since it has only one non-zero entry (which are "one" ) per row, the matrix can be defined by a 
single array containing the indices of the non-zero entries. The coarse grid matrix A c may be computed as 
follows 

(A c )ij = J2 akl 
where 1 < i, j < N c , and au is the (k, l)th entry of A. 

Numerous aggregation schemes have been proposed in the literature, but in this paper we consider two of 
the aggregation schemes as follows 
Aggregation based on strength of connection: This approach is closely related to the classical AMG 

[2 6) where one first defines the set of nodes Si to which i is strongly negatively coupled, using the 

Strong/Weak coupling threshold (i: 

Si = {j ^ i | fly < -(3 max|a ifc | }. 

Then an unmarked node i is chosen such that priority is given to the node with minimal Mi, here 
Mi being the number of unmarked nodes that are strongly negatively coupled to i. For a complete 
algorithm of the coarsening, the reader is referred to [19]. 

Aggregation based on graph matching: Several graph partitioning methods exists, notably, in soft- 
ware form [T2] dH [23] ■ Aggregation for AMG is created by calling a graph partitioner with required 
number of aggregates as an input. The subgraph being partitioned are considered as aggregates. For 
instance, in this paper we use this approach by giving a call to the METIS graph partitioner routine 
METIS_PartGraphKway with the graph of the matrix and number of partitions as input parameters. 
The partitioning information is obtained in the output argument "part" . The part array maps a given 
node to its partition, i.e., part(z) = j means that the node i is mapped to the jth partition. In fact, 
the part array essentially determines the interpolation operator P. For instance, we observe that the 
"part" array is a discrete many to one map. Thus, the ith aggregate Gi — part^ 1 (i), where 

part" 1 ^) = {j e [1, N] | part(j)=*} 

Such graph matching techniques were explored in [5] [TT] [5D] . For notational convenience, the method 
introduced in this paper will be called GMG (Graph matching based aggregation MultiGrid). 
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Let S denote the matrix which acts as a smoother in GMG method. The usual choice of S is a Gauss-Siedel 
preconditioner [32]. However, in this paper we choose ILU(O) as a smoother, we find that the choice of 
ILU(O) as a smoother gives more robustness compared to Gauss-Siedel method, however, at an additional 
storage cost. Another aspect that we explore is to use only two grid approach but with an incomplete coarse 
grid solve. That is, we use an incomplete ILU(t), where t is the tolerance for dropping the entries, see |22j . 
The approximation A c of the coarse grid operator A c is given as follows 

A c = L C U C , where, [L c , U c ] = ILUT(A C ) 

where ILUT stands for ILU(i). The reason for using only two grid, and using an incomplete (and possibly 
parallel) coarse grid solve is to avoid the recursion in the the typical AMG method. It may be profitable 
to solve the coarse grid problem in parallel and inexactly, when the problem size becomes large. This may 
be achieved by a call to one of the several hybrid incomplete solvers based on ILU [2] like approximation or 
by using a sparse approximate inverse [S] . The investigation with the parallel inexact approximation of the 
coarse solver will be done in future, and in this paper, we shall understand the qualitative behavior such as 
the convergence and robustness of the proposed strategy compared to a classical AMG approach found in 

m 

Let M = PA C P T denote the coarse grid operator interpolated to fine grid, then the two-grid preconditioner 
without post-smoothing is defined as follows 

B = (S~ 1 +M- 1 -M-i-AS- 1 )- 1 . (5) 

We notice that M _1 ss PA f T 1 P T , thus, an equation of the form Mx = y is solved by first restricting y 
to y c — P T y, then solving with the coarse matrix A c the following linear system: A c x c — y c - Finally, 
prolongating the coarse grid solution x c to x = Px c . Following diagram illustrates the two-grid hierarchy. 



Restrict y to y c :— P y 



Prolongate x c to x :— Px c 



Solvc:A c :c c — y c 



The preconditioner B is similar to the combination preconditioner defined in [TJ [13] , where instead of defining 
a coarse grid operator a deflation preconditioner is used. Thus, rather than satisfying a "filtering property" , 
the coarse grid operator satisfies the following "approximate filtering condition (AFC)" 

AP&MP, (see Theorem flU on page©, 

where columns of interpolation matrix P spans a subspace of dimension N c . Here, we have considered 
the exact coarse grid solve, the inexact version is similar to the exact two-grid preconditioner ([5]) defined 
above except that M is replaced by M — PA C P T , and we denote the inexact two grid preconditioner by 
B. In Algorithm fl}, we present the complete iterative algorithm for the inexact case; the algorithm is 
essentially a slightly modified form of algorithm presented in Figure (2.6) in [4]. The two-grid methods can 
also be integrated in a similar way in an iterative accelerator other than GMRES, to integrate with other 
accelerators, see [4]. 



2.1 Analysis of graph based two-grid method 

For any matrix K , let K y denote that the matrix K is symmetric positive definite and we use the notation 
K(:,j) to denote the jth column of K, whereas, K(j, :) to denote the jth row of K . If A >~ 0, then the inner 
product ( , )a defined by (u,v)a — u T Av is a well defined inner product, and it induces the energy norm 

1/2 

| \\a defined by \\v\\a = [v, v)J for any vector v. A matrix K is called A-selfadjoint if 

(Ku,v) = {u 1 Kv) A , 
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Algorithm 1 PSEUDOCODE TO SOLVE Ax = b, A e R NxN , x,beR N 

OBJECTIVE: To solve Ax j b 

SETUP PHASE 

Call graph partitioner to get partitions in an array, say, part. 

Use part array to form aggregates d and the prolongation matrix P (subgraphs are aggregates) 
Create coarse grid matrix A c G R^ cXjVc as follows 

{A c )ij = ^ X) a ki- 

Factor the coarse grid matrix inexactly: A c = ILUT(A C ). Here ILUT is incomplete LU with tolerance. 
Setup smoother: S = L n Uo = ILUO(A). Here ILUO is incomplete LU with zero fill-in 

Define (not to be formed explicitely) two-grid preconditioner B and M as follows 

B = (S- 1 + M" 1 - M~ 1 AS~ 1 )~ 1 , M = PA C P T 

PRECONDITIONED GMRES ITERATION 

x is an initial guess 
for j = l,2,... do 

Solve r from Br = b — Axq (Sec SOLVE Bq = z function below) 
v (i) _ r /||r|| 2 , s := ||r|| 2 ei 
for i = 1, 2, . . . , m do 

Solve w from Bw = Av^ (See SOLVE Bq — z function below) 
for fc = 1, . . . , i do 

hk,% = (w, w (fc) ), w = w-hk,ivW 
end for 

hi+i,i = \\w\\ 2 , = w/h i+lii 

apply Ji,...,Ji_i on . . . , 

construct J,, acting on the ith and (i + l)st component of /i.^, 
such that (i + l)st component of Jj/i.,, is 
set s := JiS 

if s(i + 1) is small enough then 

UPDATE^, i) and quit 
end if 
end for 
UPDATE(x, to) 
end for 

UPDATE(.t, to) 

Solve for y in Hy — s. Here upper ixi part of H has ft^j as its element, s represents the first i components 
of s 

x = x^ + yi v^ + y 2 v^ + ■■■ + W «W , s^+V = \\b - Ax\\ 2 

If x is accurate enough then quit else = x 
SOLVE Bq = z 

Solve St = z (use S = L U ), solve Mf = z (See SOLVE Mg = h function below), solve Mq = At, set 
q=t+f-q 

SOLVE Mg = h 

set h c = P T h, Solve A c g c = h c (use A c = LU), g — Pg c 
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or equivalently if 

A~ X K T A = K. 

For any matrix K, let span(if ) denote a set of all possible linear combination of the columns of the matrix 
K. Let ||a;|j denote the Euclidean norm (X)"=i a; i) 3 - I n what follows, we assume that the matrix P is 
orthonormalized, such that P T P = I. The basic linear fixed point method for solving the linear systems ([T]) 
is given as follows 

x n+1 = x n + B- X {f - At") = (J - B- l A)u n + B~ l b 

Subtracting the equation above with the identity x — x — B~ 1 Ax + B~ 1 b yields the following equation for 
the error e n+1 = u — u n 

e n+1 = (I - B- l A)e n = (I- B^Afe"' 1 =••• = (/- B~ 1 A) n+1 e°. 

Choosing B as in Equation ([5]), we have the following relation 

e n+i = g_ s^AY'+^I - M _1 A) n+1 e° 

Thus, the quality of the preconditioner B depends on how well the smoother S and the coarse grid precon- 
ditioner M acts on the error. 

In [TJ H31 HZl [Ml , a composite preconditioner similar to the one in ([5]) is proposed, where, the matrix M 
is replaced by a preconditioner, say, Mf that deflates the eigenvector corresponding to smaller eigenvalue. 
The preconditioner Mf is constructed such that it satisfies a "filtering property" as follows 

Mft = At, 

where t is a filter vector. In [T], the filter vector is choosen to be a Ritz vector corresponding to smallest 
Ritz value in magnitude obtained after a couple of iterations of ILU(O) preconditioned matrix. In [27] . first 
the iteration is started with a fixed set of filter vector, and later the filter vector is changed adaptively using 
error vector. In [13], authors fixed the filter vector to be a vector of all ones and show that the composite 
preconditioner is efficient for a range of convection-diffusion type problems. In brief, for an effective method, 
the columns of the interpolation matrix P should approximate well the eigenvectors corresponding to low 
eigenvalues. One possibility is to use P to construct a deflation preconditioner as shown in [17] . 

The following theorem shows that preconditioner M that corresponds to a coarse grid correction step satisfies 
a more general but approximate filtering condition. 

Theorem 1. // the coarse grid correction preconditioner M = PA C P T and the coarse grid operator A c = 
P T AP are nonsingular, then following holds: 

1. MP &AP 

2. If t = [1,1, ... , 1], then t € span(P) and dim(span(P)) = N c 
Proof. We have 

(I - M~ 1 A)P « P - PA c r x P T AP 
= P- PA c r x A c 
= 

From equations and ((U) , we find that P is an N x N c matrix, and the j th column of the matrix P has a 
non-zero entry P(i, j) — 1 if and only if i G Gj. Since the aggregates G'^s cover all the nodes in the set [1, N], 
for all i 6 [1, N], there exists an aggregate Gj such that i = Gj(k) for some k, and consequently P(i,j) = 1. 
Moreover, since the aggregates G'^s do not intersect, such j is unique. In other words, for each i g [1, iV], 
there exists one and only one column P(:, j) of P such that the ith entry of column P{'-,j) is 1. Hence we 
have 

Pt= p (;j) = t 

l<i<N c 

and since each columns of P are linearly independent we have dim(span(P)) = N c . □ 
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Theorem 2. If A)- 0, then A c )- 0. 



Proof. We have A c = P 1 AP and 

(A c x, x) = (P T APx, x), for x ^ 
= (APx,Px), for x ^ 
> 0, for x ^ 0. 

Notice that we use the fact that P is a boolean matrix, i.e., it has one and only one non-zero entry equal to 
"one" per row. Thus, Px ^ for x ^ 0, x £ i?^. Hence the theorem. □ 

However, the global preconditioner corresponding to the coarse grid solve represented by M = PA C P T or 
M~ x A is not necessarily SPD. We have the following counter examples 

Theorem 3. A >~ does not imply that M >- or M~ 1 A >- 0. 

Proof. Let N = 4 be the size of A. Let there be two aggregates, G\ = {1,3} and G2 = {2,4}, then the 



10 10 
10 1 



choosing x = [1,0,-1,0], we have 



restriction operator P T is defined as follows P T — 
P T x = 0. Thus, we have 

(Mx, x) ^ 0, (M~ 1 Ax, x)^0 for all x ^ 0. □ 



In literature, much results have been proved when the coefficient matrix is a diagonally dominant M— matrix. 
We collect some relevant results, and use them to understand the proposed method. 

Definition 1. Let G(A) = (V,E) be the adjacency graph of a matrix A € M. NxN . The matrix A is called 
irreducible if any vertex i £ V is connected to any vertex j G V . Otherwise, A is called reducible. 

Definition 2. A matrix A £ M. NxN is called an M— matrix if it satisfies the following three properties: 

1. an > for i = 1, . . . ,7V 

2. a tj < fori =£3, i,j = l,...,N 

3. A is non-singular and A^ 1 > 

Definition 3. A square matrix A is strictly diagonally dominant if the following holds 

\au\ > ^ l a «l>* = 1,---,N 

and it is called irreducibly diagonally dominant if A is irreducible and the following holds 

\au\ > ^2 = 1,-..,N 

where strict inequality holds for atleast one i. 

A simpler criteria for M— matrix property is given by the following theorem. 

Theorem 4. // the coefficient matrix A is strictly or irreducibly diagonally dominant and satisfies the 
following conditions 

1. an > for i = 1, . . . , N 

2. a,ij <0fori^j,i,j = l,...,N 

then A is an AI— matrix. 

Theorem 5 (|llj). // A G M. NxN is a strictly or irreducibly diagonally dominant M— matrix, then so is the 
coarse grid matrix A c — P T AP. 



Proof. The theorem is proved in □ 

Theorem 6 ( 15j). If the coefficient matrix A is symmetric M— matrix, and let S — LL T be the incomplete 
cholesky factorization, then the fixed point iteration with the error propagation matrix I — S^ 1 A is convergent. 

Theorem [6] above tells us that for an M-matrix, ILU(O) preconditioned method will be convergent by itself. 
However, the convergence is usually slow due to large iteration count with increasing problem size. Combining 
ILU(O) with a coarse grid correction leads to convergence rate which depends mildly on the problem size. 
Following result shows that the inexact factorization is as stable as the exact factorization of the coarse grid 
operator. 

Theorem 7. If the given coefficient matrix A is a symmetric irreducibly diagonally dominant M— matrix, 
and if the inexact coarse grid operator A c is based on incomplete L U factorization as follows 

A c = CHOLINC(A c ), 

where CHOLINC is the incomplete Cholesky factorization, then the construction of A c is atleast as stable as 
the construction of an exact decomposition of A c without pivoting. 

Proof. If the original matrix A is symmetric and irreducibly diagonally dominant M— matrix, then Theorem 
[5] tells us that the coarse grid operator A c obeys the same property. Now, A c being an M— matrix, Theorem 
3.2 in [15 j tells us that A c defined above is as stable as the exact Cholesky factorization of A c . □ 

For a diagonally dominant M-matrix, pivoting is rarely needed. However, pivoting generally improves the 
stability of incomplete LU type factoriations. This is the reason why we use incomplete LU with pivoting, 
namely, ILUT function of MATLAB. Moreover, using ILUT will lead to a method suitable for unsymmetric 
matrices that are not necessarily diagonally dominant. We refer the curious reader to [10] for a small 2x2 
example where pivoting would be essential to obtain stable triangular factorization. 

For problems with jumping coefficients, the ratio of maximum and minimum entry of the coefficient matrix 
can provide some useful bounds as shown in the theorem below. 

Lemma 1 (page 7, [llj). Let A be a symmetric N x N matrix with eigenvalues Ai(A) < ••• < Xn(A) 
arranged in nondecreasing order, then the following holds 

\i{A) < <mini{au} < maxi{au} < Xn{A). 

In particular, if Ay 0, then cond(A) is bounded below by ■ 
Proof. The proof follows by writing the following expression 

Ai(.A) = max^ x ^ = i{x T Ax} , A„(j4) = max\\ x \\ = i{x T Ax}, 
and by setting x as the ith column of the identity matrix /. □ 

In Table Q] we check our estimates on the problems considered in Section [3l We compare the estimate 
obtained above to the that given by the "condest" function of MATLAB. We find that the estimates obtained 
using Theorem Q] are not close, nevertheless, they do indicate the increase in the order of magnitude of the 
condition number with increasing jumps. When a given coefficient matrix is SPD, we may indeed use this as 
a heurestic in determining the quality of the inexact factorization, see last column of Table [TJ Let Uj denote 
the (i, j)th entry of the lower triangular matrix L. For a given SPD matrix, it is easily verified that the 

condition number estimate for the incomplete factorization LL is given by UH^Ojil , F 0r the unsymmetric 

• maxi j{Y2- < hjlji} mi 

matrix, the estimate is given by ' r ' ,J -'; — ; — r- Ihese estimates are cheap to compute and can be used 

as a fault-tolerant mechanism while using inexact factorization, for instance, during inexact coarse grid solve 
as implemented in this paper. 
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Table 1: Comparison matlab condest function with the estimate in Lemma [ljfor exact and inexact fac- 
torization for JUMP3D problems defined in Section [3J Here an is the (i, i)th diagonal entry of the inexact 
coarse grid operator A c factorization with drop tolerance of 10~ 4 



h 


condest 


max 


,{a„} 


max 


,{au} 


rain 


*{««} 


min 


,{5,,} 




K = 10 3 


K = 10 5 


K = 10 3 


K = 10 5 


K = 10 3 


K = 10 5 


1/20 


1.1 x 10 6 


7.8 x 10 6 


7.5 x 10 3 


7.5 x 10 5 


9.3 x 10 3 


9.3 x 10 5 


1/30 


3.7 x 10 6 


7.7 x 10 9 


9.0 x 10 3 


9.0 x 10 5 


1.0 x 10 4 


1.0 x 10 6 


1/40 


6.4 x 10 6 


4.7 x 10 9 


9.0 x 10 3 


9.0 x 10 5 


1.0 x 10 4 


1.0 x 10 6 



In [18) . convergence analysis of perturbed two-grid and multigrid method was done. In the context of domain 
decomposition methods, in \J[, numerical and theoretical analysis suggests the advantages of using inexact 
solves. However, a systematic study of the scalability of inexact coarse grid solve has been missing. 



3 Numerical experiments 



All the numerical experiments were performed in MATLAB with double precision accuracy on Intel core i7 
(720QM) with 6 GB RAM. For comparison, we use the aggregation based AMG (AGMG) software available 
at [19]. The AGMG software is a Fortran mex file, on the other hand, the AMG method introduced in 
this paper, namely, GMG, is written completely in MATLAB. For GMG, the iterative accelerator used is 
GMRES available at [23], the code was changed such that the stopping is based on the decrease of the 
2- norm of the relative residual. For AGMG, GCR method is used Q3j. For both GMRES and GCR, the 
maximum number of iterations allowed is 600, and no restart is done. The stopping criteria is the decrease 
of the relative residual below 10 -7 , i.e., when 

\\b-Ax k \\ 1Q _ 7 



Here b is the right hand side and x k is an approximation to the solution at the kth step. 



3.1 Test cases 

Convection-Diffusion: Our primary test case is the convection-diffusion Equation ([2]) defined on page [I] 
We use the notation DC to indicate that the problems are discontinous. We consider a test case as 
follows 

DC1, 2D case: The tensor k is isotropic and discontinuous. The domain contains many zones of 
high permeability that arc isolated from each other. Let [x] denote the integer value of x. For 
two-dimensional case, we define k(x) as follows: 

( ) - { 103 * ^ 10 * ^ + 1 )' if t 10 * ^ " ° ( m ° d 2 )' 1 = l ' ! 2 ' 
1 1, otherwise. 

The velocity field a is kept zero. We consider a n x n uniform grid where n is the number of 
discrete points along each spatial directions. 
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Table 2: Notations used in tables of numerical experiments 



INOiailOIlS 


IVieaillllg 


h 


Discretization step 


N 


Size of the original matrix 


Nr 
1 * C 


Size of the coarse grid matrix 


its 


Tf py a f inn count 


time 


Tot^l r^PTI time f^etim nlim snivel in seconds 


r f 


rnarspnino' factor 


ME 


AApmnrv ?i Hoppi rion nrnnlpTn 

±V_H^±±±\_J1. y OiLLKJ V_ CI LrlV-'Xl UX UXv-XXl 


Fl 


AGMG returner] flaff 1 see fTm 


SPD 


Svmmptrip nrmit.ivp npnnitp 


NA 


Not applicable 


NC 


Did not converged 




Data not available 


GMG-NO 


Graph based matching for AMG, smoother has ND ordering 


GMG-ND 


Graph based matching for AMG, smoother has natural ordering 


EGMG-ND 


Graph based matching for AMG, exact coarse grid, smoother has ND ordering 


AGMG 


Classical AMG, see [19] 


cf 


Coarsening factor, cf — n c /n for n and n c no. of discrete points 




for uniform fine and coarse grid respectively. 



DC1, 3D case: For three-dimensional case, n(x) is defined as follows: 

( _ ( 10 3 *([10*:r2] + l), if [10*^] = (mod 2) , i = 1,2,3, 
\ 1, otherwise. 

Here again, the velocity field a is kept zero. We consider a n x n x n uniform grid. The jump in 
the diagonal entries of the coefficient matrix is shown in Figure ©• 

DCC1, 2D case: Same as DC1, 2D case above, except that the velocity is non-zero and it is given 
as a(x) = (1000,1000). 

DCC2, 3D case: Same as DC1, 3D case above, except that the velocity a(x) = (1000, 1000, 1000). 

Florida matrix market collection: The list of Florida matrix matrices are shown in Table ([3]). As we 
observe, all the problems are symmetric positive definite steaming from wide range of applications. For 
more on the properties of these matries, the reader is referred to [5]. 



3.2 Comments on numerical results 

Two version of GMG are shown, namely, GMG-NO which stands for GMG where smoother has natural 
ordering, and GMG-ND stands for GMG with smoother having nested dissection ordering. In particular, 
for GMG-ND, we first apply the nested dissection reordering and then the smoother is defined. We observe 
that after applying nested dissection reordering, the smoother which is ILU(0) in our case can be computed 
and applied in parallel. Since, in ILU(0), no pivoting is done, parallelizing ILU(0) after ND ordering leads to 
a parallel smoother. Certainly, not much parallelism is expected when the smoother is applied with natural 
ordering of unknowns. As mentioned before, for the coarse grid solve, we use ILU(10 -4 ) to solve it inexactly. 
We do this inexact solve to see the effect of inexact solve in the iteration count and time. For AGMG, 
Gauss-Seidel smoothing is used, and the choice of the coarse grid is based on the strength of connection 
between nodes. Moreover, in AGMG, usual multilevel recursive approach is followed, i.e., going down the 
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Table 3: Forida matrix market matrices. Here SPD stands for symmetric positive definite. 



Matrices 


Kind 


SPD 


size 


non-zeros 


gyro_m 


Model reduction problem 


Yes 


17361 


340K 


bodyy4 


Structural problem 


Yes 


17546 


121K 


nd6k 


2D/3D problem 


Yes 


18000 


6.8M 


bodvv5 


Structural problem 


Yes 


18589 


128K 


wathenlOO 


Random 2D/3D problem 


Yes 


30401 


471K 


wathcnl20 


Random 2D/3D problem 


Yes 


36441 


565K 


torsionl 


Duplicate optimization problem 


Yes 


40000 


197K 


obstclae 


Optimization problem 


Yes 


40000 


197K 


jnlbrngl 


Optimization problem 


Yes 


40000 


199K 


minsurfo 


Optimization problem 


Yes 


40806 


203K 


gridgena 


Optimization problem 


Yes 


48962 


512K 


crankseg_l 


Structural problem 


Yes 


52804 


10M 


qa8fk 


Acoustic problem 


Yes 


66127 


1M 


cfdl 


Computational fluid dynamics 


Yes 


70656 


1.8M 


finan512 


Economic problem 


Yes 


74752 


596K 


shallow_waterl 


Computational fluid dynamics 


Yes 


81920 


327K 


2cubes_sphere 


Electromagnetic problem 


Yes 


101492 


1.6M 


ThermaLTC 


Thermal problem 


Yes 


102158 


711K 


ThermaLTK 


Thermal problem 


Yes 


102158 


711K 


G2_circuit 


circuit simulation 


Yes 


150102 


726K 


bcsstkl8 


structural problem 


Yes 


11948 


149K 


cbuckle 


structural problem 


Yes 


13681 


676K 


Pres_Poisson 


Computational fluid dynamics 


Yes 


14822 


715K 
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grid heirarchy untill the coarse grid is small enough (or it stops when it does not satisfy certain criteria) to 
be solved exactly. 

We recall here that our aim is to compare the classical multi-grid approach implemented in AGMG with the 
two grid approach of GMG with the following ingredients 

• Coarse grid based on graph matching (call to METIS) 

• ILU(O) is the smoother (built in MATLAB) 

• Coarse grid equation is solved inexactly (using built in ILU(t) routine in MATLAB) 

In Tables ©, ©, ©, ©, ©, ©, CEU]), CU}, CL2]), and ([13J) , we have shown the iteration count and the 
total CPU time: setup time plus solve time, for values of cf ranging from 3 to 8. For the values of cf lying 
between 3 and 8, we can locate the value of cf for which the CPU time is observed to be lowest, to do so, 
we had to vary cf so that we do not miss a value of cf for which the CPU time could be lowest. For 2D 
problems, we find that the AGMG method is several times faster than the GMG based methods. However, 
for 3D problems, the AGMG method does not converge at all. In contrast, GMG methods converges and 
shows mesh independent convergence rates for all values of cf. Considering 2D case first: the least CPU 
time for GMG-NO method is observed for cf = 3. For GMG-ND method, the least CPU time is observed 
for cf = 2.5 except for 2D 1200 x 1200 problem for which the least CPU time occurs for cf = 3. On the 
other hand, for EGMG-NO method, the least CPU time was observed when we had cf = 4 for 800 x 800 
and 1200 x 1200 problem, and the least CPU time for 1200 x 1200 problem was obtained when cf was equal 
to 6. For a smaller size 3D problem of 70 x 70 x 70, the iteration count rather increases rapidly with the 
increasing value of cf. For smaller problems, it seems that a finer coarse grid is necessary. The reason 
for high iteration count may be due to the fact that for a given partial differential equation, the coefficient 
matrix becomes smoother as the resolution of the mesh is increased. In Figures Q] and [2] we find that the 
convergence curve for the respective exact and inexact methods are very similar, this also suggests a similar 
spectrum and probably a similar condition number. To find how close the approximated coarse matrix is to 
the exact coarse operator, in Table 124] we compare the relative error ||LJ7 — LJ7||/||LJ7|| for both natural and 
ND ordering. We find that the relative error is quite small, this is the reason why the direct and indirect 
versions converges in similar number of iterations. But since the inexact methods are relatively fast to build 
and apply, we save significant number of CPU time and storage requirement, see Figure where an inexact 
solve needs about 10 times less storage compared to the exact solver. 

In Tables ([14]), ([15]), ([16]), CLZ]), (gOj, ([21]), and ([22]), we have similar plots for the test case DCC1. 

For this problem, we find that the AGMG method converges much faster compared to the GMG methods for 
both 2D and 3D problems, exception being the 120 x 120 x 120 problem where the method fails to converge. 
However, we remind ourselves that GMG methods are implemented in Matlab, and thus they are expected 
to be faster when they are implemented in lower level languages such as Fortran or C. Thus, our prediction 
is that even for these problems where GMG shows larger CPU time, an implementation in Fortran may 
have the convergence time comparable with that of AGMG. Notably, for this test case, the iteration count 
decreases even more rapidly (compared to test case DC1) with the increase in the size of the problem. 

The rule of thumb in the choice of coarse grid size is to increase the cf value proportionally with the 
increasing size of the problem. For a smaller size problem with discontinous coefficients such as DC1 and 
DCC1, it is good to keep the cf value small. The choice of drop tolerance in ILUT to be 10 -6 worked well 
in practice and we did not encounter any breakdown. However, in case when the jumps are large, the fault 
tolerant mechanism such as the one discussed in the analysis section can be used. 

Finally, in Table (|25[) . we show some experiments with the Florida matrix market problems. We fixed the 
coarse grid size to be 4096. In general, for most of the problems, we find that the two-grid method is faster 
compared to AGMG, exception being, torsionl, obstclae, jnlbrngl, minsurfo, qa8fk, and shallow _water, 
where AGMG is about five times faster. For rest of the problems, GMG methods shows more robustness 
compared to AGMG. Comparing GMG-NO to GMG-ND, we find that GMG-NO converges faster with few 
exceptions. In Table ([26]). we present the numerical results with cf — 2.5. A detailed investigation of the 
best coarse grid size for these problems deserves more effort and detailed study. 
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Table 4: Numerical results for DC1 problem with cf = 2.5 using GMRES(30). 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


23 


25.5 


17 


17.4 


16 


38.7 


27 


6.3 


2D 


1/1000 


23 


45.5 


18 


30.3 


18 


82.0 


37 


14.1 




1/1200 


24 


70.8 


20 


48.2 


18 


159.5 


37 


18.2 




1/70 


145 


65.8 


21 


21.0 


14 


163.0 


NC 


NA 


3D 


1/100 


191 


306.5 


26 


141.0 


ME 


NA 


NC 


NA 




1/120 


147 


598.0 


45 


379.2 


ME 


NA 


NC 


NA 



Table 5: Numerical results for DC1 problem with cf = 3 using GMRES(30). 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


28 


24.7 


20 


16.8 


19 


22.8 


27 


6.3 


2D 


1/1000 


29 


50.4 


19 


25.6 


19 


25.5 


37 


14.1 




1/1200 


26 


65.8 


19 


37.6 


20 


70.8 


37 


18.2 




1/70 


165 


66.0 


20 


14.3 


15 


47.5 


NC 


NA 


3D 


1/100 


191 


314.1 


25 


94.2 


16 


839.2 


NC 


NA 




1/120 


165 


417.0 


35 


236.8 


ME 


ME 


NC 


NA 



Table 6: Numerical results for DC1 problem with cf = 3.5 using GMRES(30). 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 



its time its time its time its time 





1/800 


33 


30.6 


28 


22.7 


22 


19.9 


27 


6.3 


2D 


1/1000 


36 


50.9 


23 


28.6 


22 


33.8 


37 


14.1 




1/1200 


34 


72.7 


23 


42.2 


22 


54.8 


37 


18.2 




1/70 


153 


55.7 


23 


11.6 


17 


21.5 


NC 


NA 


3D 


1/100 


200 


221.8 


32 


58.6 


23 


254.8 


NC 


NA 




1/120 


153 


339.5 


33 


119.5 


19 


740.2 


NC 


NA 
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Table 7: Numerical results for DC1 problem with cf = 4 using GMRES(30). 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


44 


34.8 


26 


20.6 


24 


19.7 


27 


6.3 


2D 


1/1000 


37 


49.8 


25 


30.1 


25 


33.2 


37 


14.1 




1/1200 


37 


72.3 


28 


51.1 


27 


53.8 


37 


18.2 




1/70 


164 


57.6 


84 


29.5 


55 


26.3 


NC 


NA 


3D 


1/100 


212 


235.8 


27 


44.5 


19 


116.3 


NC 


NA 




1/120 


168 


346.2 


31 


94.8 


21 


326.3 


NC 


NA 



Table 8: Numerical results for DC1 problem with cf = 4.5 using GMRES(30). 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


43 


32.7 


27 


20.5 


27 


21.1 


27 


6.3 


2D 


1/1000 


44 


53.9 


28 


33.9 


27 


33.4 


37 


14.1 




1/1200 


41 


74.0 


27 


47.5 


29 


55.6 


37 


18.2 




1/70 


170 


58.7 


89 


31.0 


70 


26.6 


NC 


NA 


3D 


1/100 


205 


215.4 


28 


40.9 


22 


63.3 


NC 


NA 




1/120 


157 


302.3 


34 


83.5 


24 


165.0 


NC 


NA 



Table 9: Numerical results for DC1 problem with cf = 5 using GMRES(30). 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


46 


34.4 


32 


24.4 


30 


24.2 


27 


6.3 


2D 


1/1000 


47 


56.8 


32 


37.9 


34 


39.2 


37 


14.1 




1/1200 


51 


86.8 


30 


53.8 


29 


53.6 


37 


18.2 




1/70 


284 


94.2 


199 


63.3 


193 


62.9 


NC 


NA 


3D 


1/100 


206 


280.9 


30 


55.2 


23 


60.5 


NC 


NA 




1/120 


168 


335.3 


40 


92.2 


26 


122.7 


NC 


NA 
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Table 10: Numerical results for DC1 problem with cf = 5.5 using GMRES(30). 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


51 


37.0 


33 


23.5 


33 


23.9 


27 


6.3 


2D 


1/1000 


52 


61.3 


34 


37.6 


33 


37.8 


37 


14.1 




1/1200 


50 


84.2 


33 


54.3 


32 


55.3 


37 


18.2 




1/70 


469 


148.1 


162 


50.9 


160 


51.5 


NC 


NA 


3D 


1/100 


219 


222.9 


148 


146.2 


109 


117.0 


NC 


NA 




1/120 


221 


393.7 


42 


83.4 


28 


95.4 


NC 


NA 



Table 11: Numerical results for DC1 problem with cf = 6 using GMRES(30). 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


55 


41.4 


36 


25.2 


36 


25.2 


27 


6.3 


2D 


1/1000 


56 


65.8 


36 


40.0 


36 


40.0 


37 


14.1 




1/1200 


55 


92.9 


36 


56.1 


36 


56.8 


37 


18.2 




1/70 


568 


183.2 


316 


98.9 


354 


115.9 


NC 


NA 


3D 


1/100 


208 


213.5 


103 


100.4 


94 


101.1 


NC 


NA 




1/120 


167 


308.1 


43 


83.7 


41 


95.5 


NC 


NA 



Table 12: Numerical results for DC1 for 2D and 3D problems with cf = 7 using GMRES(30) 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


68 


41.9 


51 


25.8 


41 


27.2 


27 


6.3 


2D 


1/1000 


68 


73.5 


41 


36.9 


41 


40.9 


37 


14.1 




1/1200 


73 


112.0 


41 


59.4 


41 


60.0 


37 


18.2 




1/70 


NC 


NA 


462 


140.9 


277 


86.6 


NC 


NA 


3D 


1/100 


566 


542.2 


266 


249.9 


255 


239.0 


NC 


NA 




1/120 


173 


314.5 


72 


127.5 


57 


114.5 


NC 


NA 
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Table 13: Numerical results for DC1 for 2D and 3D problems with cf = 8 using GMRES(30) 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


77 


48.3 


48 


30.9 


48 


30.9 


27 


6.3 


2D 


1/1000 


76 


74.6 


46 


46.8 


46 


47.2 


37 


14.1 




1/1200 


77 


108.6 


46 


67.0 


45 


65.9 


37 


18.2 




1/70 


NC 


NA 


360 


120.2 


381 


125.1 


NC 


NA 


3D 


1/100 


NC 


NA 


454 


441.4 


440 


426.5 


NC 


NA 




1/120 


486 


831.2 


186 


319.5 


187 


326.2 


NC 


NA 



Table 14: Numerical results for DCC1 problem with cf = 2.5 using GMRES(30). 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


54 


45.3 


49 


36.6 


18 


14.2 


27 


6.3 


2D 


1/1000 


75 


90.4 


60 


73.7 


19 


83.0 


37 


14.1 




1/1200 


51 


101.4 


40 


61.1 


21 


165.6 


37 


18.2 




1/70 


100 


50.3 


14 


17.4 


14 


161.6 


16 


3.4 


3D 


1/100 


139 


223.8 


15 


100.8 


ME 


NA 


15 


8.8 




1/120 


123 


471.3 


15 


264.2 


ME 


NA 


NC 


NA 



Table 15: Numerical results for DCC1 problem with cf = 3 using GMRES(30). 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


28 


24.7 


20 


16.8 


19 


22.8 


27 


6.3 


2D 


1/1000 


29 


50.4 


19 


25.6 


19 


25.5 


37 


14.1 




1/1200 


26 


65.8 


19 


37.6 


20 


70.8 


37 


18.2 




1/70 


113 


48.7 


15 


10.6 


15 


46.9 


16 


3.4 


3D 


1/100 


153 


207.3 


17 


53.2 


17 


681.8 


15 


8.8 




1/120 


135 


337.8 


17 


119.7 


ME 


ME 


NC 


NA 
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Table 16: Numerical results for DCC1 problem with cf = 3.5 using GMRES(30). 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


47 


36.5 


32 


25.1 


27 


24.7 


27 


6.3 


2D 


1/1000 


52 


64.0 


41 


44.4 


28 


42.9 


37 


14.1 




1/1200 


44 


80.8 


30 


57.8 


27 


65.5 


37 


18.2 




1/70 


129 


48.6 


16 


8.7 


16 


21.0 


16 


3.4 


3D 


1/100 


171 


194.2 


18 


36.5 


18 


248.0 


15 


8.8 




1/120 


150 


325.0 


18 


78.3 


18 


744.7 


NC 


NA 



Table 17: Numerical results for DCC1 for 2D and 3D problems with cf = 4 using GMRES(30) 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


57 


43.8 


32 


24.8 


31 


26.2 


27 


6.3 


J2D 


1/1000 


53 


56.4 


38 


41.7 


35 


43.7 


37 


14.1 




1/1200 


52 


81.3 


39 


61.9 


38 


66.1 


37 


18.2 




1/70 


138 


52.0 


26 


12.4 


26 


15.8 


16 


3.4 


3D 


1/100 


182 


191.2 


19 


31.1 


19 


109.9 


15 


8.8 




1/120 


170 


308.8 


20 


61.2 


19 


312.9 


NC 


NA 
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Table 18: Numerical results for DCC1 for 2D and 3D problems with cf = 4.5 using GMRES(30) 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


if o 
ItS 


time 


if o 

its 


time 


if a 

its 


time 




1/800 


76 


53.9 


39 


27.6 


37 


27.5 


27 


6.3 


J2D 


1/1000 


74 


82.2 


39 


43.9 


38 


44.4 


37 


14.1 




1/1200 


64 


112.6 


39 


64.1 


37 


65.6 


37 


18.2 




1/70 


151 


53.5 


38 


16.1 


38 


18.1 


16 


3.4 


3D 


1/100 


210 


234.6 


21 


31.1 


21 


66.0 


15 


8.8 




1/120 


180 


363.6 


22 


61.5 


22 


169.2 


NC 


NA 



Table 19: Numerical results for DCC1 for 2D and 3D problems with cf = 5 using GMRES(30) 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


88 


60.95 


45 


29.0 


43 


28.1 


27 


6.1 


2D 


1/1000 


80 


77.2 


48 


48.6 


49 


50.8 


37 


13.2 




1/1200 


81 


114.0 


43 


64.4 


42 


64.7 


37 


18.3 




1/70 


169 


59.4 


60 


22.8 


60 


24.0 


16 


3.4 


3D 


1/100 


224 


219.8 


22 


29.7 


22 


45.3 


15 


8.8 




1/120 


195 


340.35 


23 


56.8 


23 


111.0 


NC 


NA 



4 Conclusion 

We have proposed a two grid approach GAGMG with following ingredients 

• Coarse grid based on graph matching 

• ILU(0) is the smoother with natural or nested dissection reordering 

• Coarse grid equation is solved inexactly 

We compared out approach with the classical AGMG scheme. On comparison, wc found that the new strategy 
seems to be robust with a very modest coarse grid size which is further solved cheaply by performing an 
inexact solve. One of the aim of this work was to provide a practical, easy to implement, yet robust two-grid 
methods. 

We have tried only the sequential version of our method, in future, we would like to implement the method 
in parallel with a parallel inexact solve strategy. 
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Table 20: Numerical results for DCC1 for 2D and 3D problems with cf = 5.5 using GMRES(30) 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


107 


73.0 


50 


33.7 


48 


32.7 


27 


6.1 


2D 


1/1000 


100 


107.8 


50 


55.3 


49 


52.7 


37 


13.2 




1/1200 


90 


148.7 


48 


74.2 


47 


72.8 


37 


18.3 




1/70 


282 


100.6 


65 


24.5 


70 


26.7 


16 


3.4 


3D 


1/100 


240 


260.6 


44 


52.0 


45 


63.2 


15 


8.8 




1/120 


223 


405.1 


24 


56.9 


24 


84.5 


NC 


NA 



Table 21: Numerical results for DCC1 for 2D and 3D problems with cf = 6 using GMRES(30) 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


132 


79.7 


56 


36.7 


56 


37.0 


27 


6.3 


2D 


1/1000 


117 


112.4 


55 


56.1 


54 


55.2 


37 


14.1 




1/1200 


111 


152.5 


55 


80.9 


53 


77.9 


37 


18.2 




1/70 


261 


88.9 


90 


32.2 


88 


31.5 


16 


3.4 


3D 


1/100 


256 


250.4 


41 


47.1 


41 


51.3 


15 


8.8 




1/120 


225 


384.6 


26 


61.7 


27 


78.9 


NC 


NA 



Table 22: Numerical results for DCC1 problem with cf = 7 using GMRES(30). 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


184 


112.3 


67 


42.7 


65 


42.0 


27 


6.3 


2D 


1/1000 


173 


162.6 


69 


68.1 


67 


67.5 


37 


14.1 




1/1200 


397 


530.4 


67 


97.6 


64 


95.2 


37 


18.2 




1/70 


351 


117.5 


109 


32.6 


109 


36.3 


16 


3.4 


3D 


1/100 


301 


296.2 


80 


82.8 


78 


81.8 


15 


8.8 




1/120 


257 


440.0 


29 


69.7 


29 


74.5 


NC 


NA 
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Table 23: Numerical results for DCC1 problem with cf = 8 using GMRES(30). 



matrix h GMG-ND GMG-NO EGMG-NO AGMG 







its 


time 


its 


time 


its 


time 


its 


time 




1/800 


266 


171.3 


82 


51.5 


83 


54.2 


27 


6.3 


2D 


1/1000 


229 


225.6 


84 


81.0 


83 


79.8 


37 


14.1 




1/1200 


218 


297.7 


81 


112.4 


78 


107.8 


37 


18.2 




1/70 


502 


161.0 


123 


41.0 


115 


37.9 


16 


3.4 


3D 


1/100 


449 


445.9 


105 


102.4 


104 


101.8 


15 


8.8 




1/120 


294 


517.2 


57 


109.8 


57 


112.2 


NC 


NA 



Table 24: Comparison of exact and inexact coarse operators. Here NO stands for natural ordering and ND 
stands for nested dissection ordering 



Problem 


1/h 


\\LU-LU\\ f N() 
\\LU\\ 101 1NU 


\\LU-LU\\ r ND 

||£E7|| IOT 




1/30 


7.5e" 5 


7.8e" 5 


JUMP3D 


1/40 


8.3e~ 5 


7.2e~ 5 




1/50 


7.0e~ 5 


Lie -4 



Figure 1: Convergence curve for Pres_Poisson for exact and inexact coarse grid solves. CPU time indicated 
inside small box. 
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Table 25: Numerical results for Florida matrix market collection. 



matrix N c GMG-ND GMG-NO AGMG 







its 


time 


its 


time 


its 


time 


gyto_m 


4096 


111 


5.2 


137 


7.0 


> 600 


NA 


bodyy4 


4096 


54 


1.8 


19 


0.4 


> 600 


NA 


nd6k 


4096 


NC 


NA 


89 


21.3 


MEM 


NA 


bodyy5 


4096 


115 


4.6 


30 


0.7 


- 


- 


wathenlOO 


4096 


12 


1.4 


9 


0.7 


11 


4.1 


wathenl20 


4096 


12 


1.3 


9 


0.9 


11 


35.0 


torsionl 


4096 


13 


0.99 


9 


0.6 


6 


0.1 


obstclae 


4096 


13 


0.9 


9 


0.6 


6 


0.1 


jnlbrngl 


4096 


22 


1.5 


11 


0.7 


11 


0.1 


minsurfo 


4096 


16 


1.0 


11 


0.6 


8 


0.1 


gridgena 


4096 


310 


81.4 


189 


34.0 


> 600 


NA 


crankseg_l 


4096 


60 


22.3 


76 


26.2 


334 


69.1 


qa8fk 


4096 


11 


4.9 


12 


3.6 


15 


1.3 


cfdl 


4096 


145 


45.1 


127 


32.0 


MEM 


NA 


finan512 


4096 


7 


1.8 


7 


1.2 


4 


106.0 


shallow _waterl 


4096 


6 


1.3 


6 


0.8 


4 


0.1 


2cubes_sphere 


4096 


6 


3.9 


6 


2.5 


7 


8.6 


ThermaLTC 


4096 


6 


2.03 


7 


1.4 


MEM 


NA 


ThermaLTK 


4096 


22 


3.6 


23 


3.5 


80 


5.0 


G2_circuit 


4096 


32 


7.7 


24 


4.2 


91 


5.9 


bcsstkl8 


4096 


232 


11.6 


111 


3.7 


> 600 


NA 


cbuckle 


4096 


41 


2.3 


62 


3.0 


493 


18.7 


Pres_Poisson 


4096 


21 


1.7 


24 


1.7 


24 


16.8 



Figure 2: Convergence curve for DC3D 50 x 50 x 50 for exact and inexact coarse grid solves. CPU time 
indicated inside small box. 
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Table 26: Numerical results for Florida matrix market collection with cf = 2.5 



matrix GMG-ND EGMG-ND GMG-NO EGMG-NO AGMG 





its 


time 


its 


time 


its 


time 


its 


time 


its 


time 


gyto_m 


271 


4.1 


319 


4.6 


465 


6.9 


562 


7.3 


NC 


NA 


bodyy4 


56 


0.9 


56 


0.9 


19 


0.3 


19 


0.4 


NC 


NA 


nd6k 


NC 


NA 


NC 


NA 


NC 


NA 


NC 


NA 


ME 


NA 


bodyy5 


135 


1.8 


137 


1.8 


31 


0.6 


31 


0.6 


1 


245.4 


wathcnlOO 


12 


1.0 


12 


1.1 


9 


0.6 


9 


0.7 


11 


4.3 


wathcnl20 


12 


1.2 


12 


1.3 


9 


0.8 


9 


0.8 


11 


7.0 


torsionl 


14 


0.9 


14 


1.0 


10 


0.5 


10 


0.6 


6 


0.1 


obstclae 


14 


0.9 


14 


1.0 


10 


0.5 


10 


0.6 


6 


0.1 


jnlbrngl 


25 


1.3 


24 


1.3 


12 


0.6 


12 


0.6 


11 


0.2 


minsurfo 


18 


1.0 


18 


1.2 


11 


0.5 


12 


0.6 


7 


0.2 


gridgena 


NC 


NA 


NC 


NA 


307 


10.3 


308 


9.2 


NC 


NA 


crankseg_l 


74 


21.4 


109 


31.6 


99 


25.5 


87 


29.0 


468 


96.0 


qa8fk 


12 


4.9 


11 


5.8 


11 


3.6 


10 


4.4 


15 


1.3 


cfdl 


439 


37.0 


442 


38.3 


259 


23.0 


255 


30.5 


ME 


NA 


finan512 


6 


1.8 


6 


1.9 


7 


1.2 


7 


1.3 


4 


106.0 


shallow.waterl 


6 


1.4 


6 


2.3 


6 


0.9 


6 


1.7 


4 


0.1 


2cubcs_sphcrc 


6 


4.4 


7 


8.6 


5 


3.0 


5 


15.2 


7 


8.6 


ThermaLTC 


6 


2.2 


6 


2.5 


6 


1.5 


6 


1.9 


ME 


NA 


ThermaLTK 


NC 


NA 


NC 


NA 


NC 


NA 


NC 


NA 


NC 


NA 


G2_circuit 


28 


7.3 


21 


7.7 


21 


4.7 


15 


5.0 


74 


4.8 


bcsstkl8 


NC 


NA 


NC 


NA 


166 


1.9 


169 


2.0 


Fl 


NA 


cbuckle 


54 


1.5 


54 


1.5 


247 


4.0 


198 


3.7 


Fl 


NA 


Pres_Poisson 


24 


1.0 


22 


1.0 


26 


1.0 


26 


1.0 


24 


21.2 



Figure 3: Comparison of no. of nonzeros for exact and inexact coarse grid solves for JUMP3D 
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Figure 4: Jump in the diagonal entries of DC3D 30 x 30 x 30 matrix when k(x) values are kept as 10 3 
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